Goto

Collaborating Authors

 stochastic block model





Exact recovery and Bregman hard clustering of node-attributed Stochastic Block Model

Neural Information Processing Systems

However, in many scenarios, nodes also have attributes that are correlated with the clustering structure. Thus, network information (edges) and node information (attributes) can be jointly leveraged to design high-performance clustering algorithms. Under a general model for the network and node attributes, this work establishes an information-theoretic criterion for the exact recovery of community labels and characterizes a phase transition determined by the Chernoff-Hellinger divergence of the model.


Information-theoretic Limits for Community Detection in Network Models

Neural Information Processing Systems

We analyze the information-theoretic limits for the recovery of node labels in several network models. This includes the Stochastic Block Model, the Exponential Random Graph Model, the Latent Space Model, the Directed Preferential Attachment Model, and the Directed Small-world Model. For the Stochastic Block Model, the non-recoverability condition depends on the probabilities of having edges inside a community, and between different communities. For the Latent Space Model, the non-recoverability condition depends on the dimension of the latent space, and how far and spread are the communities in the latent space. For the Directed Preferential Attachment Model and the Directed Small-world Model, the non-recoverability condition depends on the ratio between homophily and neighborhood size. We also consider dynamic versions of the Stochastic Block Model and the Latent Space Model.


Sparse Hypergraph Community Detection Thresholds in Stochastic Block Model

Neural Information Processing Systems

Community detection in random graphs or hypergraphs is an interesting fundamental problem in statistics, machine learning and computer vision. When the hypergraphs are generated by a {\em stochastic block model}, the existence of a sharp threshold on the model parameters for community detection was conjectured by Angelini et al. 2015. In this paper, we confirm the positive part of the conjecture, the possibility of non-trivial reconstruction above the threshold, for the case of two blocks. We do so by comparing the hypergraph stochastic block model with its Erd{\o}s-R{\'e}nyi counterpart. We also obtain estimates for the parameters of the hypergraph stochastic block model. The methods developed in this paper are generalised from the study of sparse random graphs by Mossel et al. 2015 and are motivated by the work of Yuan et al. 2022. Furthermore, we present some discussion on the negative part of the conjecture, i.e., non-reconstruction of community structures.


Streaming Belief Propagation for Community Detection

Neural Information Processing Systems

The community detection problem requires to cluster the nodes of a network into a small number of well-connected'communities'. There has been substantial recent progress in characterizing the fundamental statistical limits of community detection under simple stochastic block models. However, in real-world applications, the network structure is typically dynamic, with nodes that join over time. In this setting, we would like a detection algorithm to perform only a limited number of updates at each node arrival. While standard voting approaches satisfy this constraint, it is unclear whether they exploit the network information optimally. We introduce a simple model for networks growing over time which we refer to as streaming stochastic block model (StSBM). Within this model, we prove that voting algorithms have fundamental limitations. We also develop a streaming belief-propagation (STREAMBP) approach, for which we prove optimality in certain regimes.


Optimality of variational inference for stochasticblock model with missing links

Neural Information Processing Systems

Variational methods are extremely popular in the analysis of network data. Statistical guarantees obtained for these methods typically provide asymptotic normality for the problem of estimation of global model parameters under the stochastic block model. In the present work, we consider the case of networks with missing links that is important in application and show that the variational approximation to the maximum likelihood estimator converges at the minimax rate. This provides the first minimax optimal and tractable estimator for the problem of parameter estimation for the stochastic block model with missing links. We complement our results with numerical studies of simulated and real networks, which confirm the advantages of this estimator over current methods.


Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities (II)

Carpentier, Alexandra, Giraud, Christophe, Verzelen, Nicolas

arXiv.org Machine Learning

A fundamental theoretical question in network analysis is to determine under which conditions community recovery is possible in polynomial time in the Stochastic Block Model (SBM). When the number $K$ of communities remains smaller than $\sqrt{n}$ --where $n$ denotes the number of nodes--, non-trivial community recovery is possible in polynomial time above, and only above, the Kesten--Stigum (KS) threshold, originally postulated using arguments from statistical physics. When $K \geq \sqrt{n}$, Chin, Mossel, Sohn, and Wein recently proved that, in the \emph{sparse regime}, community recovery in polynomial time is achievable below the KS threshold by counting non-backtracking paths. This finding led them to postulate a new threshold for the many-communities regime $K \geq \sqrt{n}$. Subsequently, Carpentier, Giraud, and Verzelen established the failure of low-degree polynomials below this new threshold across all density regimes, and demonstrated successful recovery above the threshold in certain moderately sparse settings. While these results provide strong evidence that, in the many community setting, the computational barrier lies at the threshold proposed in~Chin et al., the question of achieving recovery above this threshold still remains open in most density regimes. The present work is a follow-up to~Carpentier et al., in which we prove Conjecture~1.4 stated therein by: \\ 1- Constructing a family of motifs satisfying specific structural properties; and\\ 2- Proving that community recovery is possible above the proposed threshold by counting such motifs.\\ Our results complete the picture of the computational barrier for community recovery in the SBM with $K \geq \sqrt{n}$ communities. They also indicate that, in moderately sparse regimes, the optimal algorithms appear to be fundamentally different from spectral methods.


Variational Estimators for Node Popularity Models

Karki, Jony, Huang, Dongzhou, Zhao, Yunpeng

arXiv.org Machine Learning

Node popularity is recognized as a key factor in modeling real-world networks, capturing heterogeneity in connectivity across communities. This concept is equally important in bipartite networks, where nodes in different partitions may exhibit varying popularity patterns, motivating models such as the Two-Way Node Popularity Model (TNPM). Existing methods, such as the Two-Stage Divided Cosine (TSDC) algorithm, provide a scalable estimation approach but may have limitations in terms of accuracy or applicability across different types of networks. In this paper, we develop a computationally efficient and theoretically justified variational expectation-maximization (VEM) framework for the TNPM. We establish label consistency for the estimated community assignments produced by the proposed variational estimator in bipartite networks. Through extensive simulation studies, we show that our method achieves superior estimation accuracy across a range of bipartite as well as undirected networks compared to existing algorithms. Finally, we evaluate our method on real-world bipartite and undirected networks, further demonstrating its practical effectiveness and robustness.